RECONSTRUCTING BIOCHEMICAL CLUSTER NETWORKS 



UTZ-UWE HAUS, RAYMOND HEMMECKE, AND SEBASTIAN POKUTTA 

Abstract. Motivated by fundamental problems in chemistry and biology we study cluster graphs arising 
from a set of initial states S c J, n + and a set of transitions/reactions M c z". The clusters are formed 
out of states that can be mutually transformed into each other by a sequence of reversible transitions. We 
provide a solution method from computational commutative algebra that allows for deciding whether two 
given states belong to the same cluster as well as for the reconstruction of the full cluster graph. Using 
the cluster graph approach we provide solutions to two fundamental questions: 1) Deciding whether two 
states are connected, e.g., if the initial state can be turned into the final state by a sequence of transition 
and 2) listing concisely all reactions processes that can accomplish that. As a computational example, we 
apply the framework to the permanganate/oxalic acid reaction. 



1. Introduction 

The reconstruction of all biochemical reaction networks composed of a given finite set of reac- 
tions/transitions that explain a given overall reaction from an initial state to some final state is one of 
the fundamental problems in biology and chemistry alike. The extraction of this decomposition is essen- 
tial and has a wide range of applications from the design of large-scale reactors in process engineering 
where the presence of unexpected side products can disrupt the reaction process to the derivation of 
rate laws in physical chemistry. 

One initial subproblem to be solved is to decide whether there exists a reaction network at all that 
explains the given overall reaction. For example, it has been settled only recently in [J9l|, that if no 
additional catalyst is available, the 19 species postulated in (cf. 111210 do not suffice to explain the 
permanganate/oxalic acid reaction. So far, automatic tools to reconstruct the reaction networks are 
rare. Even the very promising recent approach using integer programming [ 1 1 J suffered from ad-hoc 
assumptions. Apart from that, there have been considerable advances in the study of biochemical reac- 
tion networks using methods from discrete mathematics and computer algebra which are not directly 
related to our work although they use similar methods (see e.g., QTJ, II14IL II17II ). 

We will formulate the underlying mathematical problem of computing cluster networks and present 
a solution approach from commutative algebra. Given two states, represented as vectors s, t e Z" 
(these could be an initial and a final state of a biochemical reaction) and a set of potential transitions 
M = UU—UUD CZ" where U represents undirected transitions and D represents directed transitions, 
we say s is M-connected to t (short: s — * M t) if there exists a reaction path from s to t in M. We can 
formulate the following questions: 

Question 1. Given two given states s,t€Z". Decide whether s — > M t. 

Question 2. Given two given states s, t e Z" such that s — > M t. Find all directed paths from s to t 
using only transitions in M. 

For small instances, Question 1 and to some extent also Question 2 can be solved using a purely 
enumerative approach. This approach is finite if we assume that M is homogeneous with respect to 
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some positive grading. This road was successfully pursued in Q9] to examine the permanganate/oxalic 
acid reaction. Unfortunately, this approach fails for larger problem sizes due to the vast amount of 
possibilities that have to be enumerated as a consequence of the combinatorial explosion. The situation 
is especially challenging, as due to reversible reactions a large number of reaction paths are similar and 
thus block the view onto structurally different networks. 

In this article, we provide an algorithmic solution to the following two problems. The first problem is 
the identification of equivalence classes (or clusters) of states in Z" induced by the reversible reactions 
U, that is, given u, v e Z", decide whether u <— * u v. The second problem is the construction of the 
graph of all clusters reachable from a given cluster. This cluster graph is the compressed reachability 
graph starting at s e Z" and using only transitions from M, where equivalent states with respect to U 
are contracted. 

We will use techniques from computer algebra in order to transform the problems at hand into al- 
gebraic ones and then, using Grobner bases, provide two algorithms that solve these problems. We will 
then propose a solution to Question 1 and Question 2 using the aforementioned cluster approach. The 
cluster graph that arises from contracting the state graph is usually considerably smaller and both ques- 
tions can be decided on the cluster graph using traditional graph theoretic methods, thus enabling the 
successful processing of significantly larger instances. In particular, transitions connecting two clusters 
are of major importance for any decomposition. They can be easily identified within the cluster graph. 
Thus, being able to compute cluster graphs, one may study more complex reactions whose symmetries 



and equivalent paths block the view onto essential parts of the decomposition, see Figure 2.1 



The outline of the article is as follows. As a motivation, we provide an introduction to the network 
reconstruction problem from a chemical point of view in Section |2j We will introduce the necessary 
notation and establish the link to our mathematical approach using directed graphs and commutative 
algebra. In Section[3]the necessary preliminaries and definitions from a mathematical point of view are 
formulated and basic results are established. We will then derive the described algorithms in Section [4] 
and provide computational results in Section [5] Finally, we conclude with some remarks in Section [6j 

The notation we use is standard (cf. [|4, 11310 . These books also provide excellent introductions to 
commutative computer algebra. 



2. Motivation from chemistry 

The decomposition of an overall chemical reaction into elementary reaction steps is one of the fun- 
damental questions in chemistry, since the network of elementary reactions encodes the dynamic of the 
chemical system and hence allows an analytical examination. We call a reaction elementary if at most 
two species react. For example, 2H 2 — * 2H 2 + 2 is an elementary reaction, but the reverse reaction 
2H 2 + 2 — > 2H 2 is not, as three (not necessarily different) species react. 

The decomposition problem can be solve via several steps. We demonstrate them using the well- 
known permanganate/oxalic acid reaction 

2Mn0 4 + 6H+ + 5H 2 C 2 4 -> 2Mn 2+ + 8H 2 + 10CO 2 

that has been studied since 1866 [8] and that has been studied by several groups (see e.g., HTJ 3 AH, 
H5H. II12II . [11611 . II18II . H19II ). As a first step, we has to choose a set of possible intermediate species that 
can be used to explain the overall reaction. For example, the following 19 species are a commonly 
accepted set | |12 || : 

H 2 C 2 4 HC 2 0" H+ C 2 2 ~ 

Mn 2+ MnC 2 4 MnO" Mn0 2 

Mn 3+ C0 2 H 2 [Mn0 2 ,H 2 C 2 4 ] 

CO" [Mn(C 2 4 )] + [Mn(C 2 4 ) 2 ]" [MnC 2 4 ,Mn0 4 ,H + ] 

[MnC 2 02 + ,Mn0-] + [MnC 2 2+ ,Mn03 ,H+] 2+ [H + ,Mn0 2 ,H 2 C 2 4 ] 
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Now, all possible elementary reactions among the postulated species can be computed. For this, note 
that any chemical state can be represented by a vector in u € Z" , where u ; counts how many times 
species i is present in the state. Moreover, a chemical reaction can be encoded as an integer vector 
deZ" (a difference of two states). In particular, in component d i we specify how many units of species 
i react (if d i < 0) or how many units of species i are created (if d t > 0). As any chemical reaction must 
fulfill a balance of mass and charge, d is a solution of a certain linear system of equations Az = with 
AeZ mx ™ and z e Z". For our example, we obtain 



(2.1) A ■ 



where the first four rows correspond to the mass balance equations of Mn, C, O, and H, respectively. The 
last row encodes balance of charge. The 19 columns correspond to the 19 postulated species. Thus, we 
naturally have a; > for all i e 1, . . . , m — 1, motivated by the fact that each species contains at least 
one atom. The permanganate/oxalic acid reaction is encoded in the vector 

(-5 -6 020 -2 00 10 800000000). 

The set of elementary reactions that we are looking for is now characterized by all integer vectors 
d e ker(A) with ||ci — 1| x < 2. In our example, they can be computed via ( 1 2 9 ) + ( 19 ) + ( 19 ) = 209 linear 
Diophantine systems. In total, they have 1022 solutions. 

Finally, these 1022 reactions can be used to decompose the overall reaction into elementary steps. 
In particular, we are interested in all such possible reaction networks in order to find a network that 
explains the observations from experiments most consistently. From a computational point of view, this 
final decomposition step is by far the most challenging. For example, it was shown in Q9j that the overall 
permanganate/ oxalic acid reaction cannot be decomposed at all using the 1022 elementary reactions 
only. Thus, the 19 species do not suffice in order to explain this overall reaction. A solution was given 
by including a suitable additional species. Unfortunately, using this additional species and the resulting 
additional elementary reactions, the reaction network of the overall reaction contains a huge number 
of paths from the initial state 2MnO~ + 6H+ + 5H 2 C 2 4 to the final state 2Mn 2+ + 8H 2 + 10CO 2 . 
Many of these paths correspond to identical or essentially identical reaction networks. In order to 
identify important reactions within the network, we can cluster states together that are connected 
via a sequence of reversible reactions. Note that we may also cluster according to strongly connected 
components of states but clustering according to reversible reactions preserves more information of 
the original state network. In doing so, a large amount of the combinatorial explosion is removed. The 



resulting coarser network for the example at hand is given in Figure 2.1 This cluster network exhibits 
important elementary reactions needed to explain the overall reaction: For our example it shows that, 
given the postulated species, at most three essentially different reaction networks exist; some of those 
might turn out to be impossible due to chemical restrictions. Moreover, there is a unique reaction 
connecting cluster to cluster 1, showing that this reaction must occur in any decomposition of the 
overall chemical reaction. 



3. The cluster graph framework 

In this section, we want to introduce the necessary notions and definitions. We will also present a 
few fundamental results that we will use in the following exposition. The objects of our interest will be 
directed graphs and in particular their connectivity structure. 

In the following, we will call the elements in Z" states. Let U Q Z" be a set of reversible (undirected) 
transitions, let D c Z" be a set of irreversible (directed) transitions, and let M = U U — U U D be the set 
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Figure 2.1. Cluster graph for permanganate/ oxalic acid reaction. There is a unique 
reaction connecting cluster to cluster 1, showing that it is essential in any decompo- 
sition. 



of transitions. If s e Z" is a state and u e U, then we can transition to state s — u if s — u > or to state 
s + uifs + u>0 — in this sense the transitions in U are reversible. For the irreversible transitions in 
m e D we can (only!) transition tos + mifs + m>0. Thus, we construct the infinite graph T M with 
node set Z" and with an arc from a e Z" to b e Z" if and only if b — a e M. Given two states s, t e Z" , 
we say that s is M-connected to t (short: s — > M t) if there exists a directed path in T M from s to t. 
Moreover, we assume that M is homogeneous with respect to a positive (multi-) grading deg : Z" — > 
This implies that MnZ" = and that there are only finitely many states with a given fixed degree. 
In particular, there are only finitely many states reachable from any given state using the transitions 
from M. We outline this in more detail in the following example. 

Example 3.1. Consider the matrix AG Z 5x19 given in Q2.1| ) which is the mass/charge balance matrix for 
the permanganate/oxalic acid reaction assuming 19 species. 
Adding up the first 4 rows we obtain the vector: 

a l + a 2 + a 3 + a 4 = ( 8 7 1 6 1 7 5 3 1 3 3 11 3 7 13 13 11 12 12 ) , 

which gives a desired positive grading. We can in fact also obtain a positive multi-grading by adding the 
vector a 1 + a 2 + a 3 + a 4 sufficiently often to the rows of A 



8 


7 


1 


6 


1 


8 


6 


4 


2 


3 


3 


12 


3 


8 


14 


15 


13 


14 


13 


10 


9 


1 


8 


1 


9 


5 


3 


1 


4 


3 


13 


4 


9 


17 


15 


11 


14 


14 


10 


8 


2 


6 


1 


7 


5 


3 


1 


3 


5 


13 


3 


7 


13 


14 


11 


13 


15 


12 


11 


1 


10 


1 


11 


9 


5 


1 


5 


4 


17 


5 


11 


21 


21 


18 


19 


18 


32 


29 


3 


22 


2 


28 


19 


12 


1 


12 


12 


44 


13 


27 


53 


52 


43 


46 


47 



Then the multi-grading is obtained by setting deg(u) := Au. 

Now let us partition the graph T M into clusters. This approach was initially suggested in []9D . How- 
ever, in contrast to partitioning via the strongly connected components of Y M as in []9] , we partition via 
equivalence classes which group all states v that can be reached from a specific u using only transitions 
from U. These equivalence classes will be called clusters. Note, that u — >u v if and only if v — * a u. Thus 
the relation u — >y v is reflexive, symmetric, and transitive and therefore indeed an equivalence relation 
on F M and we write u <-»[/ v. By ^(u) c Z" we denote the cluster of u e Z" which is defined in the 
canonical way, i.e., 

^(u):={vez;|u^v!. 

Now, the remaining set D c Z" of irreversible transitions defines a directed graph with the clusters 
(of Z" ) as vertices and with a directed edge from cluster ^(u) to cluster ^(v) if there exists two states 
x e ^(u) and y e ^(v) such that y-xeD. We denote this cluster graph by ^([/,D). Be aware that 

4 



we indeed only add the arc (^(u), ^(v)) if the cluster ^(v) can be reached from the cluster ^(u) by 
a transition path of length 1. 

In applications, we are usually given a finite set S of initial states. Note that in our chemical setting, 
S typically contains only one state. More than one initial state could occur if a set of experiments 
(i.e., pairs of initial and final states) have to be explained (by networks). The nodes of the subgraph 
r = (S, M) of T M reachable from S in T M are 

S = js + J] v ; I k e N, s 6 S, (v ; ) ieW C M s.t. s + £J v; > VZ e [fc] I . 

Herein, we use for convenience [k] := {1, . . . , k} for fc e N. As the arc set M c S X S we obtain the 
set of those arcs in T M involving only nodes from S. The important difference to a classical directed 
graph is now that M and S are not given explicitly but implicitly by a set of transitions/reactions 
M = U U -U U D c Z" \ and a set of initial states S. We call f = (S,M) the state graph of S and 
with respect to M. As S and M might be already very large for small instances it is favorable to not 
completely calculate S and M. In these cases, computing the much smaller cluster graph may still give 
important information on the decompostion of the overall reaction. 

As M is assumed to be homogeneous with respect to some (positive) grading, the state graph of S 
w.r.t. M decomposes for s 1 ,s 2 with different degrees and thus one can confine the analysis to sets S 
with elements of the same degree. 

Having provided the considered setting, we set out to provide algorithmic solutions to answer Ques- 
tions 1 and 2 using cluster graphs. For this we show how to construct the part ^{U , D,S) of the cluster 
graph ^(L/,D) reachable from ^(s) with seS. 

4. Reconstructing cluster graphs 

In this section we will use computer algebraic tools to reconstruct the cluster graph reachable from 
states given in a finite set S via transitions in M = L7 u — U U D. The positive grading implies that 
each cluster contains only finitely many states. Moreover, although the total cluster graph over Z" has 
infinitely many clusters as vertices, the positive grading implies that for any given state seZ" only 
finitely many clusters can be reached from the clusters ^(s) with seS, within the cluster graph. In 
order to reconstruct ^{U , D,S), we have to solve the following two problems: 

Main Problem 1. Given two states s,t e Z" . Decide whether ^(s) = ^(t), i.e., whether s <^> v t. 

Main Problem 2. Given s e Z" . List all transitions d e D that are applicable to at least one state 
s d e ^(s), that is, s d + d > 0. 

Main Problem 1 captures the problem of being able to decide whether two states are in the same 
equivalence class whereas Main Problem 2 has to be solved in order to identify clusters reachable from 
a given cluster. We will provide a solution for both problems in form of computer algebraic algorithms. 
If the clusters are small enough such that all states can be enumerated explicitly, Main Problem 1 can be 
solved by simple enumeration. However, the sizes of the clusters may prohibit an explicit enumeration. 
Therefore an approach is sought that does not explicitly enumerate the cluster elements. We will tackle 
this problem by transforming the set U into a binomial ideal. We consider the polynomial ring K[X] 
where K is an arbitrary field and X is the set of the variables. Let J v be defined to be the ideal 

j v ■- (x a+ -x u ~ : u e L7^ c K[X]. 

Then y <— * u z if and only if x y — x z e J v as shown in the following theorem. This theorem has been 
rediscovered many times — an early reference is II15II . For the sake of completeness we include a proof 
below (cf. ITTOll). 

5 



Theorem 4.1. Let U c Z" and y, z e Z£. Then y z if and ordy if x y - x z e J y . 

Proof. Let y, z e Z" such that y^z. Thus, there exists a path from y to z in Z" . More specifically, 
there exists a sequence of points (pi,P2, —,Pk) — ^+ where p 1 = y, p k = z, and p ; — p i+1 = S^U; for 
some U; e [7 and 5; e {1, —1} for i = 1, fc — 1. Thus, there exists y t e N" such that p ; = (<5 ; Uj) + + y { , 
and p i+1 = (c^u,) - + fi for every i = 1, fc — 1. Hence, x Pi — x Pi+1 = <5 ; x ri (x u i — x u > ), and therefore, 

fc-l k-l 



^~>, _ x P i+ i = ^ 5 ; x^(x< - x u r) e J L 

i=l i=l 



as required. 

Conversely, assume that x y — x z e Jy. Further, suppose for contradiction, y */*u z. As x y — x z e Jy, 
we may write x y — x z = 2/=i c ; xr ' ( xU ^ — x " ; ) where U; e [/, c ; e iC, and 7^ e N". Note that we allow 
U; = u,- for i ^ j. Now, let I := {i e [d] | (f ; + lit) y}. Clearly, (r; + up e Z£ for all is [d]. 
Note that if (ft + u i~) Y then (f ; + up <— ^ y since (7; + up - (7; + up -u t e U. Thus, the 
set of monomials consisting of x r 'x u > and x r 'x u ; for all i e J, which includes x y and not x z , is disjoint 
from the set of monomials consisting of x ri x u ; and x ri x u * for all t ^ I, which includes x z and not 
x y . Let f(x) = c i x ri (x u i f - x u > ) and let g(x) = - c i x ri (x u ' f - x u ' ). It is readily seen that the 
polynomials / (x) and g(x) have a disjoint set of monomials, and therefore, /(x) = x y and g(x) = x z 
since x y — x z = _f (x) — g(x). However, this is impossible since /(l) = and g(l) = but l y = 1 and 
l z = l. □ 



Using Theorem 4.1 we can now solve Main Problem 1 as follows. Choose -< to be an arbitrary term 



ordering, let G_<(JP be a Grobner basis of J v with respect to -<, and let y, z e Z" be two states. Then 



y <— > v z if and only if x y — x z e J y by Theorem 4.1 As G x (Jp is a Grobner basis, x y — x z G Jy if and 



only if NF_,(x y — x z , G) = 0, where NF_ < (.) is the normal form operator. Note that once the Grobner basis 
G_.(Jp of J v has been calculated, the membership test NF_,(x y — x z , G) = can be performed easily. 

As an immediate consequence of this construction, it follows that every cluster ^(s) with s e Z" 
has a unique (-<-minimal) representative outside the leading term ideal LT_,(JP which can be obtained 
by calculating the normal form. Slightly abusing notation we denote this unique minimal element by 
^(s) := NF_,(x s , G). Thus, the monomials outside of LT_<(JP are in one-to-one correspondence with 
possible clusters in Z" . Consequently, the number of clusters reachable from ^(s) is bounded by the 
number of monomials outside of LT_<(JP which have the same (multi-) degree deg(s). These numbers, 
however, are encoded in the multi-graded Hilbert-Poincare series oiJ u : 

•*? (z) := Mz,Ju, deg) := £ z de ^ a \ 

Note that this potentially infinite sum can always be written as a rational function p(z)/q(z). Moreover, 
it does not depend on the actual term ordering -< chosen and it can be computed from any Grobner basis 
of J v rather efficiently. The coefficient in front of z deg( - s - ) in the multi-variate Hilbert series expansion of 
p(z)/q(z) is exactly the number of clusters of degree deg(s) G Z . Thus, we can compute in advance an 
upper bound on the number of clusters that can be reached from ^(s) or that can reach ^(s) inside 
the cluster graph ^{U, D). 

Recall that for a directed transition deD and a state z e Z" the transition d is applicable to z, 
if z + d e Z" . Thus, in order to solve Main Problem 2, we need to decide whether a given directed 
transition d e Z" is applicable to some state z e ^(y) given only some representative y of the cluster 
^(y). The following theorem will be a main ingredient in order to solve this question. Recall that for an 
ideal I C K [X] and a vector m e Z" the colon ideal I : x m is defined as I : x m := {p S K [X] \ px m e I}. 
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Figure 4.1. Shift relative to d. 



Theorem 4.2. Let U c Z", d e and y,z e Z![ with y, z > d. Moreover, let V c Z" such that 
J v =J V : x d . Then 

y^uz <=> y-d^yz-d. 



Proo/ This follows immediately using Theorem 4.1 : Note that y <— >,j z if and only if x y — x z e J v and 



similarly, y — d <— > v z — d if and only if x y_d — x z_d e J y . Now observe that x y_d — x z_d e J v holds if 
and only if x y — x z e Jy, as J y = : x d = ^x u -x v :u + d«-> D v + dj. 

Observe that for the backward implication of the equivalence, we employ the condition y, z > d, that 
is, x y — x z is indeed a polynomial in x. □ 



Note that Theorem 



4.2 



shows that V connects any two points y, z in ^(y) with y, z > d via a sequence 
of points in ^(y) whose coordinates are also at least as big as d . Such a path can be constructed by 



shifting the relation y — d<— > v z — dbydas shown in Figure 4.1 Moreover, by computing the normal 
form of y with respect to a -<-Gr6bner basis of J v where -< is a term ordering that maximizes the j-th 
component, we can decide for a given d := d + d ; e ; whether there is some z e ^(y) with z > d. This is 
summarized in Algorithm [T] 



Input : Let U c Z", y,de Z" . Moreover, let S = supp(d), ;' e S, and d := d - d ; e ; -. 
Output: z if z e ^(y) with z ; - > d ; - exists, False otherwise. 

1 set d := d — d ; e ; -; 

2 compute V e Z" such that J v = J v : x d = ^"-x'lu + d^v+d); 

3 compute a Grobner basis G of J v w.r.t. a term ordering -< that maximizes the j-th component; 

4 x z :=NF^(x y - d ,G); 

5 if %j > dj then 

6 z :=z + de ^(y); 

7 return z; 

8 else 

9 return False; 
to end 

it end 



Algorithm 1: Coordinate increment algorithm (CI). 



We will now formulate Algorithm [2] that solves Main Problem 2 by iteratively calling the coordinate 
increment algorithm (Algorithm [TJ). Algorithm |2j constructs an element z e ^(y) with z + d e Z" for a 
given yeZ" and d e Z" if such an element z exists. 



Input : Let U,y, d be as in Theorem |4.3 

Output: ze^(y) with z + d e Z" if exists, False otherwise. 

1 set S := supp(d~); 

2 let . . . ,s k } := S be an enumeration of S with k<n; 

3 set d w := d~e s : 

4 set z(°) := y; 

5 for i e [fc] do 

6 set z (i) := CI(C/, z (i_1) , d (i) ,S;); (with CI from Theorem 

7 if z^ == False then 

8 return False; 

9 STOP; 
10 end 

n if i < k then 

12 set d (;+1) :=d w + d" e s ; 

13 end 

14 end 

15 return z^; 
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Algorithm 2: Cluster connectivity test (CCT). 



Theorem 4.3. Let L7 c Z" y e Z", and d e Z". Then Algorithm^jdecides whether there exists z e ^(y) 
with z + d e Z" and if the answer is affirmative it returns z. 

Proof. Let S, d w , z w , and fc be as calculated by Algorithm 2j Suppose that z e ^(y) with z + deZ" 
exists. It suffices to show that at the end of iteration i we obtain an element z^ e ^(y) with > d~ 

Sj Sj 

for all j e [i]. Then for i = k = | supp(d~)| the assertion follows as > d~ for all j e [k] and hence 
z w + dez;. 

We have to verify that in iteration i we can apply Algorithm [l] to the input U, z*- 1-1 ^ d^ l \ and s ; . At 
the start of iteration i we have z 1 - 1-1 ^ > d*- 1 -* — d~e, . In the case of i = 1 this is clear as d*- 1 -* — d~e„ = 

and z (0) = y e TL\. For i > 1 this follows by induction as z (i_1) := CI([/, z (; " 2;) , d^" 1 ^;.!) with z (i_l3 > 
d^ 1-1 - 1 if such an element z^ 1-1 ^ e ^(y) exists. With d^ 1-1 - 1 = d^ — d~e,. the claim follows. Thus we 

can indeed apply Algorithm [l] to the input U, z^ l ~ J \ d®, 5; and it returns z^ with z^ > d^ if such 
an element z^ 1 -* e ^(y) exists, so that the loop is well defined and we obtain > d^ at the end of 
each iteration. Let i e [k] and observe that > d~ for all j e [i] so that, together with z*^ > d^, it 

Sj Sj 

follows z t w > d" for all € [i]. 

Sj Sj 

In case there is no such z e ^(y) with z + d e Z" , there exists i G [k] such that CI([/, z^ l_1) , d w ,s ; ) 
returns 'False' and the algorithm stops. This finishes the proof. 

□ 

Remark 4.4. Note that in our chemical setting, | |d~ 1 1 1 < 2. Therefore, in order to construct some z e ^ (y) 
such that z + de Z" , we have to compute J v : x t only with respect to at most one variable x t for each d. 



Clearly, one can compute suitable Grobner bases for J v and for J v : x ; with i e [n] in advance and reuse 
them when constructing the cluster graph. Moreover, note that a generating set of J v : x t can be computed 
from J v via a DegRevLex-Grdbner basis computation with an ordering such that x t is minimal (that is, 
that maximizes the i-th component), and then removing x t from any binomial by division where possible. 
The resulting generating set of J v : x t is, in fact, still a Grobner basis of J v : x ; that maximizes the i-th 
component. 



In order to apply Theorem 4.3 we have to compute a Grobner basis ofJ v : x t with respect to a DegRevLex 
term ordering that maximizes some other component j. As we have already computed a Grobner basis of 
J v : x t , we might even use a Grobner walk to compute the desired DegRevLex-Grdbner basis G i; of J v : x t 
that maximizes the j-th component. 

Finally, note that also all the Grobner bases G i; - for which there is some d e D with supp(d~) = {i,j} 
can be computed in advance to constructing the cluster graph. The construction of the cluster graph reduces 
to computing normal forms with respect to these Grobner bases then. 

We will now present an algorithm that reconstructs the cluster graph ^{U , D,S). 

Lemma 4.5. Let s e Z" be an initial state and M = U U -U U D c Z n be as above. Then Algorithm |5] 
reconstructs the induced cluster graph ^{U , D,S). 

Proof. The list N contains all the nodes that have been discovered but not visited yet. The list V contains 
all visited nodes and E is the edge set that we construct successively. Clearly Algorithm [3] is finite if the 
cluster graph ^{U , D, S) is finite: In each iteration of the loop starting in 5 we remove one element from 
N and we only add nodes to N (lines 15/16) if the constructed node is new. It remains to show that the 
algorithm reconstructs the cluster graph ^{U, D, S). We initialize N with NF(s, G v ) and we successively 
process all the nodes in N. For each node u £ JV we have to decide whether a directed transition d 
can lead to a new adjacent cluster. Therefore we employ Algorithm [2] to construct v := CCT(t7, u, d) 
if such a v exists. In a next step we verify that v + d ^ ^(u). In this case we found a transition 
d leading to an adjacent cluster and we add the corresponding edge (u,w) to E. Finally we test if 
w := NF(v+ d, Gy) ^ V, i.e., we discovered a new cluster and add w to V in this case. □ 

We will now explain how the cluster approach can be used to answer Question 1 and Question 2 
mentioned in Section FT} Let s e Z" and M = [/U-fJUD c z" beas above and let <3(U,D,S) be the 
cluster graph computea by Algorithm |3j 

Remark 4.6 (Answer to Question 1). We can answer Question 1, i.e., given two states s,te Z", decide 
whether s — > M t. It suffices to calculate, e.g., the shortest path from s to t in ^(U,D,S). If such a path 
exists, then we have s — > M t Otherwise such a path does not exist. 

As the cluster graph is usually considerably smaller than the state graph, this test can be performed 
even for larger reaction networks. If one is interested in a specific path from s to t which, e.g., minimizes 
the energy needed for the reaction, we can attach the required energy as weights to the edges and then 
use the shortest path algorithm to compute the desired path. In order to answer Question 2, we have 
to define exactly when we consider two paths essentially different: 

Definition 4.7. Let s,teZ" be two states such that s — > M t Further let (Vf),- g [ n ],(w t t - e [ m ] c S be two 
(directed) paths with Vi = w 1 = s and v n = w m = t Then (Vj) t - e [ n ],(Wj)i e [ m ] are essentially different if 
the induced paths in ^(U,D,S) are different. 

With the definition as above, Question 2 can be answered as follows. 

Remark 4.8 (Answer to Question 2). Let s, t e Z" be two states such that s — > M t Enumerate all paths 
from s to t in ^{U , D, S) by performing a depth-first search on ^{U, D, S). 

Usually, enumerating all potential paths in a graph is rather expensive (actually already path count- 
ing is #P-complete). The reduction to the cluster graph ^(L/,D,S) reduces the problem size though, so 
that realistic networks can be tackled. 
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Input : Let U, D, s be as in Lemma |43j 




Output: Cluster graph <g(U,D,S). 


1 


1 p1" Cttt hp a Crrohnpr hasis of T TT wrt to an arhitrarilv rhospn fprm nrrlprinp* 


2 
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10 
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12 
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14 


set F •= FuJfu wll" 


15 


if w 6^ V thpn 

11 VV V- V L11V--11 


16 


set N :=N U {w}; 


17 


set V := V U {w}; 


18 


end 


19 


end 


20 


end 


21 


end 


22 


return (V, £); 


23 


end 



Algorithm 3: Cluster graph reconstruction (CGR). 



5. Computational results 



We will now provide some computational results for Algorithm [3j All computations were performed 
in CoCoA 4.7 (see []3j ) on a machine with a dual core x86_64 processor with 2Ghz and 2GB of main 
memory. For the permanganate/oxalic acid reaction and the corresponding mass/charge balance equa- 



tions pJ] ) we obtain 1022 elementary reactions M = UU-UUD.We had to construct 19-18 + 1 = 343 
Grobner bases for the cluster graph reconstruction. All the necessary Grobner bases computations for 
this system were performed in less than 2 min (lm53s) and the remaining time for the actual cluster 
graph reconstruction can be neglected (<ls) as only normal form computations were performed. The 
Grobner basis of Jy contained 136 elements and the Grobner bases of the corresponding ideals J v : X; 
were of similar size. 

As shown in [9], 16 species do suffice to explain the reaction process. We performed the same 
computations for the reduced matrix and obtained both, significantly reduced computational time and 
size of the Grobner bases. In this case the 16-15 + 1 = 241 Grobner bases computations were performed 
in less than 14 sees and the Grobner basis of J v contained 29 elements. The Grobner bases of the ideals 
J u : X; were again of similar size. Again, the time for computing the cluster graph, see Figure 2.1 is 
neglectable (<ls). 

We were also able compute a Grobner basis of Jy for a large reaction network from an air pollution 
model involving 80 species and 13556 reversible elementary reactions. The Grobner basis of J v could 
be computed in less than 5 min (4m48s) and contained 5740 elements. 
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6. Concluding remarks 



We have given algorithms to construct the induced subgraph <3(U,D,S) of ^{U ,D) that is reach- 
able from a given finite set S of states. From ^{U , D,S) we can extract useful information about the 
decomposition of the overall reaction. For example, transitions can be identified that have to occur in 
any decompostion. Our computations for the permanganate/ oxalic acid reaction and the air pollution 
problem indicate that our algorithms are applicable in practice even for real-world size problems with 
a large number of species. Therefore, we are confident that our cluster network approach will turn out 
very useful in the study of more complex biochemical reaction systems. 
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